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i-H We introduce and explore a new class of stationary time series models 

^H for variance matrices based on a constructive definition exploiting inverse 

^^ Wishart distribution theory. The main class of models explored is a novel 

class of stationary, first-order autoregressive (AR) processes on the cone of 

i— l positive semi-definite matrices. Aspects of the theory and structure of these 

^ new models for multivariate "volatility" processes are described in detail 

and exemplified. We then develop approaches to model fitting via Bayesian 
simulation-based computations, creating a custom filtering method that relies 
on an efficient innovations sampler. An example is then provided in analy- 
sis of a multivariate electroencephalogram (EEG) time series in neurological 
[j^\ studies. We conclude by discussing potential further developments of higher- 

b-H order AR models and a number of connections with prior approaches. 

+2 1. Introduction. Modeling the temporal dependence structure in a sequence 

of variance matrices is of increasing interest in multi- and matrix-variate time series 

analysis, with motivating applications in fields as diverse as econometrics, neu- 

roscience, epidemiology and spatial-temporal modeling. Some key interests and 

Q\ needs are in defining: (i) classes of stationary stochastic process models on the 

^ cone of symmetric, non-negative definite matrices that offer flexibility to model 

ly-N differing degrees of dependence structures as well as short-term predictive ability; 

■^ (ii) models that are open to theoretical study and interpretation; and (iii) models 

<3> generating some degree of analytic and computational tractability for model fitting 

and exploitation in applied work. 

The context is a sequence of q x q variance matrices (i.e., symmetric, non- 
# ^ negative definite matrices) T, t in discrete time t = 0, 1, ... , typically the variance 

^ matrices of components of more elaborate state- space models for an observable 
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time series. In econometrics and finance, variants of "observation-driven" multi- 
variate ARCH (Engle, 2002) models and state-space or "parameter-driven" multi- 
variate stochastic volatility models (Harvey et al., 1994; Quintana and West, 1987) 
are widely used. While the former directly specify "volatility" matrices T, t as func- 
tions of lagged values and past data, state-space approaches use formal stochas- 
tic process models that offer cleaner interpretation, access to theoretical under- 
standing as well as potential to scale more easily with dimension; see Chib et al. 
(2009) for a survey of such approaches. The class of state-space models based on 
Bayesian discount methods (Quintana and West, 1987; Quintana, 1992; Quintana 
et al., 2003, 2010; Uhlig, 1994; West and Harrison, 1997), are also widely used in 
financial applications for local volatility estimation and smoothing. These methods 
are, however, restricted to local estimation due to the underlying non- stationary 
random- walk style model for Y> t ] see Prado and West (2010) for recent review and 
additional developments. 

Two recent contributions explore constructions of AR(1) style models based 
on conditional Wishart transition distributions (Gourieroux et al., 2009; Philipov 
and Glickman, 2006a,b). These aim to provide flexibility in modeling one-step de- 
pendencies balanced with parsimony in parameterization through properties of the 
Wishart distribution. These models tend to be rather intractable theoretically, hence 
somewhat difficult to understand and interpret, while model fitting is challenging 
and there are open questions of how useful potential higher-order variants might 
be. We discuss these approaches and issues further in Section 8. 

The centrality of inverse Wishart theory to current Bayesian state-space ap- 
proaches underlies the ideas for new model classes explored in this paper. We in- 
troduce a class of stationary, non-linear autoregressive (AR) models for variance 
matrices by exploiting the structure of conditional and marginal distributions in 
the inverse Wishart family. We denote the resulting models by AR or IW-AR for 
definiteness, and use AR(1) or IW-AR(l) to be more specific about first-order mod- 
els when needed; most of the development of this paper is for first-order models. 
The new IW-AR models are open to some useful theoretical analysis of margins, 
stationarity, reversibility, and conditional moments, among other properties. Ex- 
ploiting the state-space nature of the IW-AR(l) process, we develop an MCMC 
sampler based on forward filtering backward sampling (FFBS) proposals that re- 
sults in tractable Bayesian computations. This operates locally on a matrix innova- 
tions process to ameliorate issues arising from global accept-rejects of the variance 
matrix process (e.g., exponential decrease in acceptance rates with increasing se- 
quence length) albeit at increased computational cost. 

Section 2 introduces the new models and some aspects of the theoretical struc- 
ture are explored in Section 3. Posterior computations are developed in Section 5, 
building on a data augmentation idea discussed in Section 4. An example in EEG 
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time series analysis is given in Section 6 and Section 7 discusses extensions to 
higher-order AR dependencies. Section 8 discusses connections with other ap- 
proaches, Section 9 provides summary comments and supporting technical ma- 
terial is appended. 

For time ranges we use the concise notation s : t to denote the sequence of time 
indices s, s + 1, . . . , t; e.g., S 0: r = {Eo, . . . , E T } and S t _i :t = {^t-u s t}- 

2. First-Order Inverse Wishart Autoregressive Processes. 

2.1. Construction. As context, suppose we are to observe a series of q x 1 
vector observations x t with 

(2.1) s t |Et~iV(0,Et), * = 1:T, 

where x t is independent of {x 5 , £ s ; s < £} conditional on T, t . We aim to cap- 
ture the volatility dynamics with a stationary, first-order Markov model for the T, t 
sequence. The joint density for matrices over an arbitrary time period t = : T is 

(2.2) p(Eo:T) - T — 

for some time invariant joint density p(Ej-i, S t ) for consecutive matrices in the 
numerator terms; this joint density has common margins given by the time invariant 
p(T>t) appearing in the denominator terms. 

We take the defining joint density p(Et_i, E t ) as arising from an inverse Wishart 
on an augmented state-space. Specifically, introduce random matrices fa such that 

for some degree of freedom parameter n>0,agxg variance matrix parameter S 
and agxg matrix parameter F such that the 2q x 2g parameter matrix parameter of 
the distribution above is non-negative definite. This inverse Wishart has common 
margin for the diagonal blocks; for each t, 

(2.4) E f ~IW g (n + 2,nS) 

with E[E t ] = S. It is now clear that eqn. (2.2) defines stationary first-order process 
with eqn. (2.4) as the stationary (marginal) distribution. Transitions are governed 
by the conditional density p(Et|Et-i) implicitly defined by eqn. (2.3). This has no 
closed analytic form but is now explored theoretically. 
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2.2. Innovations Process. The joint distribution of {St-i, Et, <fit} defined in 
eqn. (2.3) can be reformulated in terms of {Et_i, Tj, ^t} where {Tt,^} are 
marginally matrix normal, inverse Wishart distributed and independent of St-i- 
Specifically, standard inverse Wishart theory (e.g. Carvalho et al., 2007) implies 
that 

(2.5) E* = tf t + TtEt-iTj. 

and 0t = T^St-i where the q x g matrices {T t ^ t } follow 

(2.6) q \ 

T t \^ t ~N(F^ u (nS)- 1 ) 

with V = S — FSF f and where {T t ^t} are conditionally independent of Et_i ~ 
IW g (n + 2, nS). Eqn. (2.5) is an explicit AR(1) equation in which T t acts as a 
random autoregressive coefficient matrix and ^ t an additive random disturbance. 
Since {T t ,^ t } are independent at each t and drive the dynamics of this IW-AR 
process, we refer to them as latent innovations. 

2.3. Special Case of q = 1. When q = 1, E t = a t > and the IW-AR 
process reduces to an inverse gamma autoregressive process. Now S = s > and 
F = f e (— 1, 1) and the joint density of eqn. (2.3) is 

such that 

(2.8) <r t ~JG'' n + 2 n5 



2 ' 2 

Equivalently, with scalar innovations {Tt, *t} = { v t> ^t} 5 
(2.9) a t = i/; t + v 2 a t -\ 



where 



(2-io) ^-i G ( n -±^ n -^n) and Vt ^ t ^ N (fA 

We can see immediate analogies with the standard linear, Gaussian AR(1) process 
with a random AR coefficient. The marginal mean of v\ is (nf 2 + l)/(n+l) which 
plays the role of an average linear autoregressive coefficient. For |/| close to 1, the 
model approaches the stationary/non- stationary boundary, and when n is large, the 
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mean AR(1) coefficient is close to f 2 . Also, E[ifj t ] = ns(l — f 2 )/(n + 1) so that 
for fixed n and s the additive innovation noise tends to be smaller as |/| approaches 
unity. Parameters (n, /) also control dispersion of the additive innovations through, 
for example, V(ip t ) = 2[ns(l — / 2 )] 2 /[(n + l) 2 (n — 1)]. Section 3 further explores 
this in the general multivariate setting as well as this special case of q = 1. 

This inverse gamma autoregressive process is related to the formulation of Pitt 
et al. (2002). In that work, the authors construct a stationary autoregressive pro- 
cess {a t } with inverse gamma marginals by harnessing a conditionally gamma 
distributed latent process {z t }. The sequence {<j t , zt} obtained by generating a t | 
zt-i and zt \ cr t from the respective closed form conditional distributions leads 
to a marginal process {a t } with the desired autoregressive structure. Extensions 
to Bayesian nonparametric transition kernels is considered in Mena and Walker 
(2005) and to state-space volatility processes in Pitt and Walker (2005). Although 
related in spirit to this work, the proposed IW-AR process represents a novel con- 
struction. One attribute of the IW-AR approach, as explored in Section 3, is that our 
process need not be reversible depending upon the parameterization specified by 
/ and s. Furthermore, our formulation allows straightforward higher-order exten- 
sions, discussed in Section 7. Additional discussion and other related approaches 
appears in Section 8. 

3. Theoretical Properties. 

3.1. Marginal Processes for Submatrices and Univariate Elements. Consider 
any partition of X^ into blocks Ej^-, (i = 1 : /, j = 1 : J), where z, j represent 
consecutive blocks of consecutive sets of row and column indices, respectively. 
As a special case this also defines scalar elements. Then the evolution of each 
submatrix T,t,ij depends upon every element of Ej-i as follows: 



(3.1) E t>y = tf t ,y + [T t> ii ... T 



tAJ\ Zit-1 



Xi; 



t Uj 



Here ^ t ,ij ~ IW(n + g + 2, nVij) and \Yt,n • • • ^t,u] has a conditional matrix 
normal distribution induced from the joint distribution of eqn. (2.6). 

3.2. Stationarity. 

THEOREM 3.1. The process defined via eqn. (2.3) is strictly stationary when 
the parameterization of the inverse Wishart of yields a valid distribution: that is, 
when S and S — FSF' are positive definite. 
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PROOF. This follows directly from the constructive definition using eqn. (2.3). 
For a valid model, the scale matrix must be positive definite. Equivalently, via 
Sylvester's criterion and the Schur complement, S and S — FSF' must be positive 
definite. □ 

We note extensions to non-negative definite cases when the resulting T, t matri- 
ces are singular with singular inverse Wishart distributions, although these are of 
limited practical interest so we focus on non-singular cases throughout. 

In simple cases of F = pl q the stationarity condition reduces to \S\ > and 
\p\ < 1. Other special cases are those in which F, S share eigenvectors with eigen- 
decompositions F = ERE' and S = EQE' . Stationarity is assured when | \Q\ |o > 
Oand \\R\\o < 1. 

3.3. Reversibility. 

Theorem 3.2. The process is time-reversible if and only if FS — SF' . 

PROOF. The reverse-time process on the Y> t is as follows. Eqn. (2.3) implies 
that 

for some latent process (f) t > Then, as in Section 2, we have 

(3.3) Et_i = % + tot { 
where 

(3.4) V t - W q (n + q + 2, nV) and f t \ * t - N(F, * t , (nS)' 1 ) 

with V = S - SF'S^FS and F = SF'S- 1 . If FS = SF', then V = V and 
F = F and the reverse-time process follows the same model as the forward-time 
process. Conversely, assume a reversible process (i.e., F = F and V" = V) with 
FS 7^ SF' . Since F = SF' S~ x , a contradiction immediately arises. □ 



Examples of reversible IW-AR processes include cases when F = pl q or when 
F = ERE' and 5 = EQE'. Note, however, that the process is irreversible when 
F = diag(pi, . . . , p q ) with distinct elements and S is non-diagonal. 
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3.4. Conditional Mean. The IW-AR yields a simple form for the conditional 
expectation of E t given Ej-i- 

Theorem 3.3. 

(3.5) £[E t | St-i] = FYtt-rF' + c n , g [l + trjEt-i^)" 1 }] y 



wzY/z c 



n, g 



n/(n + q). 



Proof. See Appendix. 



□ 



Theorem 3.3 illuminates the inherent matrix linearity of the model and the in- 
terpretation of F as a "square root" AR parameter matrix. The conditional mean 
regression form is FT^-iF' corrected by a term that reflects the skewness of the 
conditional distribution. For large n, the underlying inverse Wishart distributions 
are less skewed and this latter term is small; indeed 



(3.6) 



lim E[X t | E t _i] = S + F(E t _i - S)F'. 



3.5. Principal Component IW-AR Processes. Assume that F and S share eigen- 
vectors so that the IW-AR model is reversible. There exists a principal component 
IW-AR process, as follows. 

THEOREM 3.4. Suppose that F = ERE' and S = EQE f where E is orthog- 
onal and R — diag(pi, . . . , p q ) and Q = diag(£i, . . . , £ q ) with positive elements. 
Then the sequence of matrices defined by E^ = E f Tj t E follows an IW-AR(l) model 
with degrees of freedom n, scale matrix Q andAR matrix R. Specifically, 

tt = * t + f t tt-iT { 

where i& t = E f ^ t E and T t = E f T t E are such that 

ift ~ IW q (n + q + 2, nQ(I - R 2 )) and t t \ % ~ N(R, * t , (nQ)' 1 ) 

and with marginal distribution E^ ~ IW q (n + 2, nQ). 
The conditional moment is given by 



(3.7) E[H t |E t _i]=i2E t _i/J + c 



n,q 



i + X)K0 _1 St-i (i 



Q(I-R 2 ). 



eqn. 3. 7 implies that there exists a zero-mean noise process vtj such that 



(3.8) Eijj - PjT, t -ljj + C ntq 



i + E(^) _1 ^-i. 



(1 - fi)tj + Vt. 



Autoregressive processes for the other terms ofYt t are similarly defined. 
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Proof. See Appendix. □ 

3.6. Exponential Forgetting. It is also interesting to examine properties of the 
IW-AR process as a function of the parameters F, S, and n and an initial value 
Ho. The mean of the IW-AR process forgets its initial condition exponentially fast 
under a wide range of conditions on the parameterization. 

Theorem 3.5. Assuming that \\SWoq, US' -1 Hoc, and ||i^||oo are each bounded 
by some finite A E R, then 

(3.9) E[X t | E ] = F%f'* + c n , q (S - F'SF 1 ) + 0{n~ l l • l') 
where c n?(? = n/(n + q) and 1 denotes a column vector of ones. Further, 

(3.10) lim #[£ t | S ] = S + F* (E - S)^*. 

n— >-oo 

Proof. See Appendix. D 

From eqn. (3.10), based on fixed S and in the limit as n — )► oo, we can directly 
analyze the effects of the elements of F on the conditional mean. 

Unitary Bounded Spectral Radius of F. If we assume that F has spectral radius 
p{F) < 1 (i.e., the magnitude of the largest eigenvalue of F is less than 1), The- 
orem 3.5 implies that forn —¥ oo the conditional mean goes exponentially fast to 
the marginal mean S, with a rate proportional p(F). 

Univariate Process (q = 1). In the univariate case we can analytically examine 
the conditional mean E[cr t | 0"o] without relying on the limit of n — >> oo. Using 
the notation of Section 2.3 and recursing on the form of E[a t \ <Jt-i] specified in 
Theorem 3.3, 

Since we assume |/| < 1 this becomes 

(3.12) £7[<7 t | a ] = s + ( ^^ J K - s). 

This has the form of a linear AR(1) model with AR parameter (nf 2 + l)/(n + 
1). Then lim^oo F[crt | co] = s when |/| < 1, and does so exponentially fast 
regardless of n. However, the overall rate of this exponential forgetting is governed 
by the AR parameter (nf 2 + l)/(n + 1). 
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Shared Eigenvectors Between F, S and So. In Theorem 3.4, we examined a prin- 
cipal component IW-AR process based on the shared eigenvectors of F and S. If 
we further assume that So shares eigenvectors with F and S, the following the- 
orem shows that T, t = E r Yi t E remains diagonal in expectation and has a closed 
form mean recursion. 

Theorem 3.6. Under the conditions of Theorem 3.4 assume So = EQqE' 
where Oo is diagonal, Oo = diag(#o)/#r some q— vector 6q with positive elements. 
Let 9 t \Q denote the q— vector of the eigenvalues of E[S t | So], £ = diag(Q) and 
£_i = diag (Q _1 ). Then E[Et I ^o] has the form of a first-order, non-diagonal 
autoregression on9 t \ 

(3.13) 6 m = c n , q (I - R 2 )£ + [n- 1 ^! - R 2 )^'-i + R 2 } t -i\0, 



or 



(3.14) e t{0 = B% + J2B T 



t-i 

a 

T=0 



where a = c n ^ q (I — R 2 )i and B — n 1 c ri:q (I — R 2 )^ f _i + R 2 . Assuming a 
stationary process such that \\R\\o < 1, 

(3.15) 6 t]0 = B t e + (I-B)- 1 (I-B t )a, 
and 

(3.16) lim 9 t \ = e 

That is, the eigenvalues of the limiting conditional mean are exactly those of the 
marginal mean S. 

Proof. See Appendix. □ 

Recalling that Q t \ fully determines E[S t | So] in the case of shared eigenvec- 
tors, we once again conclude that the conditional mean of the process forgets the 
initial condition Sq exponentially fast — this occurs irregardless of the value of n. 

4. Data Augmentation. Augmentation of the observation model eqn. (2.1) 
provides interpretation of the latent innovations process {T^, ^ t } as well as form- 
ing central and critical theoretical development for posterior computations as de- 
tailed in Section 5. Conditional on Sq and the innovations sequence, the observa- 
tion model can be regarded as arising by marginalization over an inherent latent 
q— vector process zt, (t = 1, . . .), where 

(4.1) x t \zt~ N(T t zt, *t) and z t ~ N(0, S t _i) 



10 
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Fig 1. Representation of the graphical model of the IW-AR(l) process under augmentation by the 
latent z t . Circles indicate random variables, arrows imply probabilistic conditional relationships 
while squares represent quantities that are deterministic based on an instantiation of the variables 
in their parents nodes. 

independently over time. That is, the observations x t are from a conditionally lin- 
ear model with latent covariate vectors z t and regression parameters {T^, \E^}. The 
normal-inverse Wishart prior for {T t ^ t } provides a conjugate prior in this stan- 
dard multivariate regression framework. See Figure 1 for a graphical model repre- 
sentation of this process. 

Let y t = [z' t x' t }' and A t = {E t _i, T t , %}. Then 



(4.2) 



p(y 1:T | Ai :T ) = Y[N(x t | r t zt,*t)N(z t | 0,E t _i) 
t=i 

T 

p(A 1:T ) = P(S | n, S) H NIW q (T u % \ F, n, S) 



t=i 



where NIW denotes the matrix normal, inverse Wishart prior on {T t ^t} of 
eqn. (2.6). We omit the dependency of the left hand side on the hyperparameters n, 
F, and S for notational simplicity. Figure 2 displays the resulting graphical model, 
clearly illustrating the simplified conditional independence structure that enables 
computation as developed below. Note that A t plays the role of an augmented state 
and the evolution to time t defines X^ as a deterministic function of this state. 

5. Model Fitting via MCMC. For model fitting, we develop a Markov chain 
Monte Carlo (MCMC) sampler that harnesses the simplified state-space structure 
of the augmented model comprised of Gaussian observations with an IW-AR pro- 
cess. This structure (Figure 2) immediately suggests a natural MCMC sampler that 
iterates between the following steps: 
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FIG 2. Representation of the graphical model for the augmented data yt = [z' t x' t ]' and latent states 
A t = {Et-i,T t ,*t}. 

Step 1. Impute the latent process by sampling each zt from 

p(zt | x t , T t , V t , St_i) oc N(z t | 0, Z t -i)N(x t \ T t z t , V t ) 

(5.i) = N( Zt | (e-_\ + t;*- 1 ^)- 1 ^*- 1 ^, (s-_\ + t;*- 1 ^)- 1 ) 

Step 2. Update the hyperparameters S and F conditioned on x\ : t and z± : t by 
sampling steps defined in Section 5.2 below. 

Step 3. Impute the augmented variance matrix states using a Metropolis-Hastings 
approach targeting the conditional posterior 

(5.2) p(A 1:T \x 1:T ,z 1:Tj F,S). 

We do this using an approximate forward filtering, backward sampling (FFBS) 
algorithm to define proposal distributions; see Section 5.1 below. 

Note that Step 2 and Step 3 comprise a block- sampling of the IW-AR hyperpa- 
rameters {F, S} and the augmented process Ai : j- conditioned on x\ : t and z\ : t- 
This greatly improves efficiency relative to a sampler that iterates between (i) sam- 
pling Ai : t given F, S, x\ : t. and z\ : t and (ii) sampling {F, S} given Ai : j- (which 
is then conditionally independent of x\^. and z\ : t). 

5.1. Forward Filtering, Backward Sampling. We utilize the fact that there is a 
deterministic mapping from A* to the augmented matrix 

(5 " 3) At "> V TtEt-i Et 

and thus use the two interchangeably. Our goal is to develop an approximate for- 
ward filtering algorithm that produces an approximation to p(Ai : x | J/i : t)> which 
can then be used in backward-sampling a posterior sequence So : t- We examine the 
filtering and sampling stages in turn. 
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Approximate Forward Filtering. An exact forward filtering would involve recur- 
sively updating p(A t \ yi :t -i) to p(A t \ y 1:t ) and propagating p(A t \ yv.t) to 
p(A t+ i | yi-.t). However, as examined in the Appendix, this filter is analytically 
intractable for the IW-AR so we use an approximate filtering procedure based on 
moment-matching in order to maintain inverse Wishart approximations to each 
propagate and update step. Specifically, let gt-i\t-i(^t-i | yi-.t-i) denote the ap- 
proximation to the posterior p(A t _i | yi-.t-i) at time t — 1. We then approximate 
the predictive distribution p( A t \ yi-.t-i) by 

(5.4) ft | t _i(A t | yv.t-i) = IW (r u (r t - 2)E 9t _ 1]t _ 1 [A t \ y^]) . 

Here, r t is a specified degree of freedom to use in the approximation at time t. 
The subsequent update step of incorporating observation y t is exact based on the 
approximations made so far. Namely, 

(5.5) 9tlt (A t | y 1:t ) = IW (r t + 1, (r t - 2)E 9t _ Mt _ 1 [A t \ y 1:t _ 1 ] + y t y^ . 
The required expectation here is easily seen to be 

(5-6) E 9t _ 1]t _ 1 [A t \y 1 .,. l \ 

St-i St-iF 

FSt-i FSt-!F' + cn 9q (l + tr^.!^)- 1 )^) 

where the S t sequence is updated using the identity 

(5.7) (r t - l)S t = (r t - 2) {FSt-iF' + c^(l + triSt^inS)- 1 )^} + x t x f t 

as further detailed in the Appendix. 

In summary, the approximate forward filtering is defined by a recursion in which 
the inverse Wishart distribution for A^_i is converted into a predictive distribu- 
tion for A t . This predictive distribution is also taken to be inverse Wishart degree 
of freedom r t and mean E 9t _ 1|t _ ± [A t \ yi-.t-i] - i.e., the predictive mean under 
the distribution g t _ 1 \ t _ 1 and the dynamics specified by the IW-AR prior. The in- 
verse Wishart predictive distribution is then directly updated to the resulting inverse 
Wishart distribution for A^. 

Backward Sampling. Running these forward filtering computations to time t = T 
yields # T |t(At | y 1:T ) approximating the true posterior p(At \ yi-.r)- We use the 
sequence of approximations g t \t(^t \ yv.t) in deriving a backwards sampling stage, 
which we show is exact based on the approximations made in the forward filtering. 
At time t = T, we sample H^ from the implied approximate posterior margin 

(5.8) 
S T rsj # T | T (£ T |yi :T ) = IW(r T + 1, [r T - 2)E gT _ 1{T _ 1 pr|yi:T-i] + x T x T ). 
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Fig 3. Graphical model of the reverse time inverse Wishart autoregressive process where ^t ~ 
IW(n + q + 2,71V) and f t | f t ~ iV(F, §t, (nS) _1 ) wif/i F = SF f S~^ and V = 
S — SF' S~ 1 FS. The reverse time transitions are defined by St-i = \Pt + TtStTj. S/«c£ one 
can deterministic ally compute {Tt, ^t} from At = {Tt, ^t, St} (s££ ^w. (5.12)), fe augmented 
observation y t are conditionally independent given Ai : t. 

We then harness the reverse time process, depicted in Figure 3. As we iterate back- 
wards in time, we condition on the previously sampled T, t to sample Ht-i as fol- 
lows. We first sample 



(5.9) 

and then set 
(5.10) 



tft^IWfa + l + ^G^-Gf (G: 



^Gf) 



22\ 



Et_i = % + T t E t T{. 

Here, G t = (r t - 2)£7 flt _ 1|t _ 1 [A t \vi*-i] + VtVt> with C t n , Gf, Gf denoting the 
three unique q x q sub-blocks (G\ 2 = Gf- ). These terms, which can be regarded 
as sufficient statistics of the forward filtering procedure, can be written as 

Gl 1 = (r t - 2)S t -i + zt4 
(5.11) Gf = (r t - 2)FS t -i + x t z' t 

Gf = (r t - l)S t , 

with St recursively defined as in (5.7). In practice, conditioned on {n, £, F}, the 
sequence SW is precomputed and simply accessed in the backward sampling of 

E():T- 

Note that if we wish to impute {T t ,^ t },we can deterministically compute them 
based on the sampled {Et_i, St, Tt, ^t}; that is, 



(5.12) 



^ = E t t ;s-_\ 



and * t = E t - T t Et_iT{. 
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Accept-Reject Calculation. We use the proposed approximate FFBS scheme as 
a proposal distribution for a Metropolis Hastings stage. Let q(- | •) represent the 
proposal distribution for {St,Ti : t,*i : t} implied by the sequence of forward 
filtering approximations gt\t(^t \ Vi-.t)- For every proposed A*. T , we compare the 
ratio 

(5.13) r(A * :T) _M^t* :T ^* :T |, 1:T ) 



g(£^T* :T ,** :T |y 1:T ) 

to r(Ai : r), where Ai : j- is the previous sample of the augmented sequence. If 
r(A*. T ) > r(Ai : r), we accept the proposed sequence. Otherwise, we accept the 
sequence with probability r(A^. T )/r(Ai : T). 

The accept-reject ratio is calculated as follows. Noting that there is a one-to-one 
mapping between {Ht, ^i:T, *i : t} an d {^o, T i : t, ^i : t}> 

p(E T , Ti : t, *1:T | J/1:t) p(j/l:T | ^0, Ti :T , *1:t)p(£t, Tl:T, *1 : t) 

(j.14) = - oc 



g(E T , Ti:T, *1:T I 2/1:t) ffC^Tj ^1:T, *1:T | V1:t) 

The augmented data likelihood is given by 

T 

(5.15) p(y 1:T | S , T 1:T , * 1:T ) = "[I N(x t | T t z u %)N(z t | 0, £ t _i). 

t=i 

As specified in eqn. (3.4), the prior of the reverse time process is given by 

(5.16) p(Er, f i :T , * 1:T ) = IW(S T | n + 2, nS) 

T 

x f] IW(* t I n + ? + 2, n^)iV(f t | F, %, (nS)- 1 ) 
t=i 

Similarly, the proposal density decomposes as 



(5.17) g(Er, f i :T , §1:T | 2/1:t) = IW(S T | T T + 1, Gf ) f] / t (f t)^(*t | t t ) 

where 

/ t (f t ) = IW(* t I r t + 9 + 1, G t n - Gf\GfY x Gf) 



and 

M#* I %) = ^(f t I Gf '(Gf )" 1 , * t , (G? 2 )" 1 ) 
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FFBS Computations. One important note is that the acceptance rate decreases 
exponentially fast with the length of the time series, as with all Metropolis-based 
samplers for sequences of states in hidden Markov models. Recall that the proposed 
Hq. t sequence is based on a sample H^ from g T \ T and a collection of T indepen- 
dent samples {T^, ^ } from distributions based on g t \ t . If the final approximate 
filtered distribution g T \ T is a poor approximation to the true distribution, then the 
collection of T independent proposed innovations {T^^} are unlikely to re- 
sult in a Hq. t that explains the data well. The accuracy of the approximation g T \ T 
decreases with T. Furthermore, even if g T \ T is a good approximation to the true 
posterior, a single poor innovations sample {T^, ^} can be detrimental since the 
effects propagate in defining Y^ ):t _ 1 . The chance of obtaining an unlikely sample 
{Y^, \E^} for some t increases with T. 

Since the distributions contributing to the accept-reject ratio r (Ai : t) factor over 
t, one can sequentially compute and monitor this ratio based on the samples of 
H^ and {T£ T , ^. T }. One can then imagine harnessing ideas from randomness 
recycling (Fill and Huber, 2001) to improve efficiency by rejecting locally in- 
stead of rejecting an entire sample path from t = 0, . . . , T. Additionally, one 
could develop adaptive methods in which samples {Y^,^} leading to drastic 
declines in the acceptance-ratio-to-t were rejected and {Y^, *&%} was then resam- 
pled, but only for some finite period of adaptation. These ideas all focus on the 
ability to accept or reject entire sub-sequences {EJ., Y*. T , ^. T }, and require the- 
oretical analysis to justify convergence to the correct stationary distribution. Al- 
ternatively, we develop below an innovations-based sampling approach in which 
we fix {Ht, Ti : t-i } t+i:T 5 ^1:£-i,£+1:t} and simply consider accepting or rejecting 
{Y^, *£} at a single time step t. 

An Innovations-Based Sampler. We propose an innovations sampler that accepts 
or rejects {Y^, \E^} for every t instead of accepting or rejecting the entire chain 
Hq. t induced from the collection of these backwards innovations samples (and 
E50. Specifically, let 9 t = {f t , tfj and S t = {0 U 2 , . . . , T , S T }. For each t we 
propose 

(5.18) <S t * = {#i, #2, • • • , 0t-i, 0u 0*+i) • • • ? 0T? ^t}, 

with d\ = {Y^, ^t}. The accept-reject ratio based on eqn. (5.13) simplifies to 

K A l:r) = PJ S t 1 V1:t) g(gj I V1:t) 
r(A 1:T ) q{S% | J/i :T ) p(S t I J/1:t) 

= P(Vl:T I So,T|. t ,^|. t ,T t+ i :T , ^+1^)^*) g(5 t I V1:t) 
P(Vl:T | So,Ti :T ,*1:t) p(S t ) q(S } 



J t 



yv.T) 



UUl^jXr | Y*z r ^*)iV(z r 1 g^)g^) gt (fl t 1 y 1;r ) 
ITr=i W(*r I T T z r ,* T )JV(z r | E T _i)ft(e t ) qt(9t | 2/l:T) 
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Here pt(@t) denotes the matrix normal, inverse Wishart prior on the backwards 
innovations 9 t = {T^, ^t} and q t (9t \ Vi-.t) the corresponding distribution under 
the forward-filtering based proposal. That is, p t {-) and q t (- \ yi-.r) represent the 
time t components of eqns. (5.16) and (5.17), respectively. We utilize the fact that 
the prior, proposal and likelihood terms all factor over t. For the prior and proposal, 
the only terms that differ between the proposed sequence S% and the previous St 
are the backwards innovations at time t (i.e, 9 t ). For the likelihood, the effects of 
the change in backwards innovations at time t propagate to the forward parameters 
{So, T*. t , ^* :t } while leaving {T t+ i :T , ^t+i-.r} unchanged. See Figure 3. 

Note that the proposed innovations sampler is quite computationally intensive 
since the accept-reject ratio calculation for each proposed {T^, *&%} requires re- 
computing So:t-i- I n P rac tice, we employ an approximate sampler that harnesses 
the fact that the effects of a given {T^, \E^} on T^_ T decreases in expectation as 
r increases. That is, {T^, ^} represents a stochastic input whose effect is prop- 
agated through a stable dynamical system (assuming the F has spectral norm less 
than 1). In particular, we only calculate T^_ T for r increasing until the Frobenius 
norm | |HJl r — S t _ r 1 12 < e for some pre- specified, small value e > 0. That is, we 
propagate the effects of {T£, *£} until the value of I^_ r becomes nearly indistin- 
guishable numerically from the previous Et_ r . Alternatively, in order to maintain 
a constant per-sample computational complexity, one can specify a fixed lag r 
based on F and n since these hyperparameters determine (in expectation) the rate 
at which the effects of the proposed {Y^, *&%} decay. 

In contrast to sequential sampling of { Y^, \E^} for t = T, . . . , 1, one can imagine 
focusing on regions where the current X^ is "poor", where "poor" is determined by 
some specified metric (e.g., the likelihood function N(x t | 0, St)). As long as there 
is still positive probability of considering any t £ {1, . . . , T}, the resulting sampler 
will converge to the correct stationary distribution. 

Finally, instead of always running an approximate forward filter and performing 
backward sampling (where the "backward sampling" can occur in any order based 
on the innovations representation), one could run a backward filter and perform 
forward sampling (BFFS), exploiting the theory of the reverse-time IW-AR pro- 
cess. By interchanging FFBS with BFFS, the errors aggregated during filtering and 
the uncertainty inherent at the filter's starting point alternate from t = to t = T, 
thus producing samples closer to the values that would be obtained if smoothing 
were analytically feasible. 

5.2. Hyperparameters. In sampling the IW-AR hyperparameters F and S, we 
need to ensure that V = S — FSF' remains positive definite. Section 3.2 explored 
two cases in which simple constraints on F imply V positive definite for S positive 
definite: (i) F = pl q or (ii) F = ERE' with E the eigenvectors of S. A simple 



INVERSE WISHART PROCESSES 17 

framework for sampling the hyperparameters in this case is to propose S from a 
Wishart distribution, thus ensuring its positive-definiteness, and the eigenvalues of 
F from a beta proposal, thus ensuring spectral radius bounded by 1. The induced 
V will then be positive definite. One can also assume Wishart and beta priors. 

Both of the above specifications of F and S lead to reversible IW-AR processes. 
For a non-reversible IW-AR process (assuming S non-diagonal), we can take F — 
diag(p), implying V = (1 • V — p • //) o S where o denotes the Hadamard product. 
Note that even with \pi \ < 1 and S positive definite, V need not be positive definite. 
However, for V positive definite and \pi\ < 1, S will be positive definite and 
has elements simply defined by Sij = Vij/(1 — PiPj). Thus, in the case of F 
diagonal we sample F and V and then compute S from these values. Once again, 
we employ a beta prior on pi and a Wishart prior now on V. The details of the 
posterior computations for the case of F diagonal are outlined below. The case of 
F = ERE f and S = EQE f follows similarly. 

Let W(/uo, Vo) denote a Wishart prior for V and Beta(cpo,i, c(l — po,i)) a beta 
prior for pi. We use an independence chain sampler in which V* is proposed from 
a Wishart proposal W(^i, V\) and p\ from a beta proposal Beta(dpi^, d(\ — pi^)). 
The accept-reject ratio is then calculated based on the ratio 
(5.20) 
r(y*,F*) = p(x 1:T | z 1: t,F*,V*)p(z 1:T \ F*,V*)p(y*)p(F*)q(y)q(F) 

r(V, F) p(x 1:T | z 1:T , F, V)p(z 1:T \ F, V)p(V)p(F)q(V*)q(F*) ' 

Here, p(-) and q(-) denote the prior and proposal for the specified argument, re- 
spectively. The conditional likelihood p{x\ : T \ zi : t, F, V) and marginal likelihood 
p(zi:T I F, V) are derived below. We interchange {F, V } and {F, S} since there is 
a bijective mapping between the two when F is diagonal. 

Conditional Likelihood p(xi-T | zi : t,F,S). Since x t ~ N(T t z t ^t) with T^ | 
\& t rsj N(F, * t , (nSy 1 ) and ^ t ~ IW(n + q + 2, nV), marginalizing T t and * t 
yields a multivariate t distribution as detailed in the Appendix. This results in 

T 

(5.21) p(x 1:T | zi : t,F,S) = Y[t n+q+2 (x t | Fz t ,c n , q+2 {1 + z' t (nS)~ l z t }V) . 

t=i 

Marginal Likelihood p(z± : t \ F, S). Computing the marginal likelihood requires 
the evaluation of the analytically intractable integral 

(5.22) p(z 1:T | F, S) = f\[p{zt | S t _i)p(S :T I F, 5)dS 0: T. 

However, we can approximate the marginal likelihood by employing an approxi- 
mate filter in a manner analogous to that of Section 5.1. In particular, if we had 
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an exact filter that produced the predicted distribution p(T, t -i I zv.t-i) and the 
updated distribution p(E$_i | zi :t ), we could recursively compute the marginal 
likelihood as 

/<0 ox / x P( z t I S t -i)p(S t _i I z 1:t -i) 

(5.23) p(z 1:t ) = 7^ ; r P{zi:t-1) 

(here F and 5 are omitted for notational simplicity). Recall that p(z t | Et-i) = 
iV(^t | 0, St-i). Since exact filtering is not possible, we propose an approximate 
moment-matched filter using ideas parallel to those used for the FFBS approxima- 
tion to p(A t | yi:t). Specifically, 

(5.24) g tlt (X t | z 1:t ) = IW (r t , (r t - 2)E t , t ) 

(5.25) ft|t+i(E t | sia+i) = IW (r t + 1, (r t - 2)E t , t + Zt+i4+i) , 

with S t | t = E gt _ llt [I] t | zi : t]. The matched-means are recursively computed using 

(5.26) (r t -i - l)E t _!| t = (r t -i - 2)E t _ 1 , t _ 1 + ^ 

(5.27) E t , t = FE^F 7 + c n , g [l + tr{S t _!| t (ri^)- 1 }] , 

with initial condition E | = S. 

Using this approximate filter in eqn. (5.23) for the marginal likelihood and can- 
celing terms yields 

(5.28) 

m:i |^,^ 7 r^/ 2 l = l| (ri _ 1 _ 2) s t _ 1|t _ 1 + ^|(n- 1+ ,)/2 r( I ¥ i) • 

Note that we could have employed our filter for A t based on observations y t = 
[^ a^]' to produce an approximation to p(xi : t, z\ : t \ F, S). However, since we 
have an exact form for p(x\ : t \ zi : t, F, S) we choose to reduce the impact of our 
approximation by simply using it to compute p{z\^ \ F,S). 

We note further that it is straightforward to analyze p(F : S | Eo, Ti : t, *i : t)» 
suggesting that the MCMC use Gibbs sampler components for F, S that would 
avoid approximations. However, in practice we found use of this leads to extremely 
slow mixing rates relative to our proposed strategy above. 

6. Stochastic Volatility in Time Series. In this section, we consider a full 
analysis in which actual observations & are from a VAR(r) model whose innova- 
tions xt have IW-AR(l) volatility matrices. That is, we observe q— vector data & 
such that 

r 

(6.1) £ t = Y,Mt-i + xt, x t ~tf(0,Et), 
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where A{ is the q x q autoregressive parameter matrix at lag i and we now assume 
that S t is an IW-AR(l) process. Define A = [Ai • • • A r ] . 

We modify the MCMC of Section 5 as follows. In place of Step 1 that previously 
sampled z\ : t given Ai : j- and x\ : t, we now block sample {A, z\ : t} given Ai : r and 
fii-r;T> That is, we first sample A given Ai : t and ii- r -.T and then zi : t given A, 
Ai : t and Ci-r:T- Noting that xi : t is a deterministic function of £i_ r: T and the 
autoregressive matrix A, the latter step follows exactly as before. Step 2 and Ste/? J 
remain unchanged. Thus, the only modification to the sampler is the insertion of a 
Step to sample A given Ai : j- and £i- r: r- See the Appendix for further details. 

6.1. Example in Analysis of EEG Time Series. In multi-channel electroen- 
cephalogram (EEG) studies, multiple probes on the scalp of a patient undergo- 
ing an induced brain seizure generate electrical potential fluctuation signals that 
represent the spatially localized read-outs of the underlying signal (Krystal et al., 
1999a). Much of prior work with clinically relevant data sets has been on the eval- 
uation of time: frequency structure in such series (Freyermuth et al., 2010; Krystal 
et al., 2000; Ombao et al., 2005) and time- varying parameter vector autoregressions 
are key tools in this applied context, as in others (Prado and West, 1997, 2010; 
West et al., 1999). Existing models represent some aspects of cross-series structure 
in this inherently spatially distributed multiple time series context (Krystal et al., 
1999b; Prado et al., 2001), but past studies have shown substantial residual depen- 
dencies among estimated innovations processes across EEG probe locations and 
the implications for estimation of such structure in models that ignore significant 
patterns of time- varying cross-series correlations are largely unexplored. Hence it 
is of interest to explore models that use IW-AR models for multivariate volatility 
processes of innovations driving vector autoregressions as an obvious first-step. 

We explore one initial example using the model of eqn. (6.1). Define v t = 
^t 1/2 ^t so that 

r 

(6.2) v t = Y^ A t ,iVt-i + w u w t ~ N(0, 1) 

2=1 

— 1/2 

where the A t j = E t ' A^t-i are structured, time- varying AR parameter ma- 
trices for the transformed process. We can fit this model in the original form of 
eqn. (6.1) and this transformed series is then of interest as defining underlying in- 
dependent component series. 

An example data analysis uses q = 10 channels of a sub-sampled series of 1000 
time points, taken from the larger data set of West et al. (1999). The original series 
were collected at a rate of 256/second and these are down-sampled by a factor of 
2 here to yield T = 1000 observations over roughly 8 seconds. The data were first 
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standardized over a significantly longer time window, and the selected 8 second 
section of data corresponds to a recording period containing abnormal neuronal 
activity and thus increased changes in volatility. The example sets r = 8 and uses 
underlying diagonal autoregressive matrices Ai = diag(a^) with independent and 
relatively diffuse priors a\ ~ iV(0, 10/ g ). 

For the IW-AR model component, we assume the rather general, irreversible 
IW-AR process with a diagonal F = diag(p) and set n = 6. We specify priors on 
p and S based on an exploratory analysis of an earlier held-out section of the time 
series, £j. T , also of length 1000. Specifically, this was based on estimating innova- 
tions x\. T from q separate, univariate TVAR models as in Krystal et al. (1999b). 
Treating these constructed zero-mean series as raw data, the standard variance ma- 
trix discounting method (Prado and West, 2010) was applied using an initial 20 
degrees of freedom and a discount factor (3 = 0.95 to generate 100 independent 
posterior samples of the series of q x q variance matrices, say Uo-.t, across this 
prior, hold-out period. We then applied individual univariate IW-AR(l) models - 
the inverse gamma processes of Section 2.3 - to each of the diagonal data sets Ua^. 
From these, we extracted summary information on which to base the priors for the 
real data analysis, as follows. First, we take pi ~ Beta(100po,i, 100(1 — po,*))? 
independently, where p^ i = (a^(n + 1) — l)/n and ai is the approximate posterior 
mean of the IW-AR autoregressive parameter from the hold-out data analysis of 
Uutf second, we set v$ = q + 2 and Vb = (1 • 1' — po ■ Po) ° So, where So is the 
sample mean of all of the the U\ : t- 

Although centered around a held-out-data-informed mean, the chosen Wishart 
prior for S is quite diffuse and the beta priors for the pi are weakly informative 
relative to the number of observations T = 1000. Our use of initial hold-out 
data to specify priors is coherent and consistent with common practice in other 
areas of Bayesian forecasting and dynamic modeling such as in using factor mod- 
els; Aguilar and West (2000), for example, adopt such an approach and give useful 
discussion of the importance of centering hyperprior support around "reasonable" 
values for these types of dynamic models. 

From an identical analysis on the batch of test data, we infer values p\ and 
Vi that are used in specifying the Beta(dpi 5 i, d(l — pi,i)) proposal for pi and 
W(vi, Vi) proposal for V used in our MCMC algorithm. After some experimenta- 
tion, this used tuning parameters d = 750 and v\ = 40. The FFBS proposals also 
rely on defining the moment-matched IW degree of freedom parameters ro-.r for 
which we set ro = n + 2, which matches the prior specification, and then discount 
as r t = 0.98rt-i + 1. Also, in employing the approximate innovations-based sam- 
pler described in Section 5.1, the analysis monitors based on | |HJl r — St- T | b < e 
and uses e = le — 4. 

Some summaries of analysis are based on running 5 separate MCMC chains for 
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5000 iterations, discarding the first 1000 samples of each and thinning by examin- 
ing every 10th MCMC iteration. Note that we count one full sweep of side-by-side 
innovations based FFBS of Ho:T as one step in an iteration. The sampler was ini- 
tialized with F and S based on the mean of their respective proposal distributions 
and the residuals x\ : t computed from q separate univariate TVAR analyses. The 
sequences Ho : t, Ti : t, and ^\ : t were initialized by directly accepting the first pro- 
posal from one step of the FFBS algorithm. 

Figure 4 displays volatility trajectories for each of the 10 examined EEG chan- 
nels showing clear changes in volatility over the 8 seconds of data, while related 
temporal structure in cross-series covariances is evident in Figure 5. These changes 
are also captured in Figure 6, which display the time- varying correlations between 
the EEG channel AR innovations. For the model parameters, Figure 7 shows clear 
evidence of learning via changes from prior to posterior summaries for the pi 
and Sa elements; this figure also highlights the high dependence in the IW-AR(l) 
model and heterogeneity across EEG channels. 

7. IW-AR(2) and Higher Order Models. The constructive approach for IW- 
AR(1) models extends to higher orders. This can be done in a number of ways, as 
follows. 

7.1. Direct Extension. For any order p > 1, transition distributions of IW- 
AR(p) processes can be defined by the conditionals of q x q diagonal blocks of 
underlying inverse Wishart distributions for (p + l)q x (p + l)q matrices. This 
involves a direct extension of the basic idea underlying the IW-AR(l) model con- 
struction. We develop this here for the case of p = 2. 

With p = 2, begin with 

(7.1) ( J^l 1 At - lF * ) = ( &_! Et_i # ) ~ IW 3q (n + 2, nS 3 



r t A t _i s 



where 



(7.2) 



\ 


(Zt-2 Cl 


= 


4>t-i Et-i 


/ 


\ It <k 




(S G' H' 


-5 3 = 


\G S G' 




\H G S 



Then, for all t, 

(7.3) A t ~IW 2q (n + 2,nS 2 ) with S 2 

(7.4) V t ~IW q (n + 2,nS). 



S a 

G S 
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FIG 4. EEG time series from 2 of the 10 channels (first two frames) followed by estimated trajectories 

1/2 

of volatilities E J t for each ofi = 1:10 time series representing EEG channels 7-16 in the original 
data set. The IW-AR posterior mean, computed based on averaging over 5 chains from iterations 
[1000 : 10 : 5000], is shown in black. The point-wise 95% highest posterior density intervals are 
indicated in blue. 
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Fig 5. Estimated trajectories of covariance terms Tiij^fori ^ j for j = 1 (EEG channel 7) colored 
as in Figure 4. 
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FIG 6. Estimated trajectories of correlations between each of 6 channels and all other channels as 
a function of time. The correlations are computed based on posterior means of St using MCMC 
samples [1000 : 10 : 5000] from 5 chains. 
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FlG 7. Box plots of the posterior samples of pi and Su, respectively, based on the same MCMC 
samples as in Figure 4. The prior means are marked by green triangles. 
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This then constructively defines a stationary order 2 process with common bivariate 
and univariate margins. In contrast with the IW-AR(l) construction of Section 2, 
{(/>£, 7t} in eqn. (7.1) are not independent over time. Rather, if T t = [r^, ^t], 
then we have defined an autoregressive process on the augmented variance ele- 
ments: 

(7.5) 7t = r tl E t _ 2 + r t2 ^_i 

(7.6) ^ = r t2 Et-i + r t i^_ 1 . 

The "memory" induced by these off-diagonal elements is evident as the full con- 
ditional distribution for H^ is p(Z t | A t _i), whereas the IW-AR(2) observation 
model is piT^t | St-i:t-2) which involves marginalization over the relevant condi- 
tional for the off-diagonal matrices. 

As in the case of the IW-AR(l), the construction of eqn. (7.1) implies that 

(7.7) ^ t = n t + T t A t . 1 T f t , 

with time t innovation matrices T t (q x 2q) and Q t (q x q) independent of A^-i 
and distributed as 



(7.8) Q t ~ IW q ( n + 2 + 2g, n LS - [if G] S 2 _1 [if G] ' 

(7.9) T t \Sk~N{[H G]S^\n u (nS 2 )- 1 ). 

If we assume that H = GS~ 1 G such that 
(7.10) 

Ss=( G °S GS G' G )=( lQ * *[0 G5^r 

VGS'-^ G 5 y V[0 GS j5 2 5 

then 

(7.11) fi t ~ JW, (n + 2 + 2g, n (S - GS^G')) 

(7.12) r t |fi t ~iV([0 G5- 1 ],fi t ,(n5 2 )- 1 ). 

Furthermore, taking G = F5 leads to 

«■»> *=(™ X ?) = ([.?!«, * [ °5 F] ') 



and 



(7.14) fi t ~ /^(ni 2 + 2g,n (5 - FSF')) 

(7.15) r t |a~JV([0 ^.^(nSa)" 1 ). 
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Note that for the specified IWs q to be a valid distribution, we need S3 positive 
definite. As before, this is equivalent to S2 and the Schur complement of S3 being 
positive definite. Taking G = FS and H = F 2 S, the Schur complement is simply 
S - FSF' and 

(7.16) |S 3 | = \S 2 \\S-FSF f \ = \S\\S-FSF f \ 2 . 

So, just as in the IW-AR(l), we require S and V = S — FSF' to be positive 
definite; the conditions for a valid process and stationarity have not changed in this 
extension to the IW-AR(2) process based on the chosen parameterization. 

More general IW-AR(p) follow from the obvious extension of this constructive 
approach. Note that the ancillary off-diagonal blocks of the extended IW^ p+ ^ q 
matrix defining the IW-AR(p) transition distributions are latent variables that will 
feature in Bayesian fitting. 

7.2. A Second Constructive Approach to Higher-Order Models. A related, al- 
ternative and novel approach is defined by coupling AR components to generate 
higher order AR structures. Specifically, take St = \&t + TjEt-iTj with p(Tj, \E^) 
having a marginal matrix normal, inverse Wishart form as in the IW-AR(l) model 
of Section 2. Denote the hyperparameters of this IW-AR(l) by fi\ = {n, F, S}. 

Now introduce Markovian dependence into the ^ t sequence while maintaining 
the same conditional independence of (T t \^ t ) on the history of the Y, t process. 
Specifically, take an IW-AR(l) model for ^ t so that for each t 

(7.17) % = St + *t*t-i*t> 

with time t innovations {$t 5 ^t} having independent matrix normal, inverse Wishart 
distributions with defining parameters ^2 = {n + g, i7, V }, where V = S — FSF' . 
This induces a second-order Markov model 

(7.18) S t = TtSt-iX; + *tSt-i*t " ^Tt-iSt-sT^^; + S t . 

Stationarity of this IW-AR(2) process is implied by simply ensuring the station- 
arity of the IW-AR(l) St process and the embedded IW-AR(l) ^ t process: each of 
S, V and now W = V — HVH' must be positive definite. 

Conditional on At_i earlier defined, we have 

(7.19) £[St I At_i] = FSt-iF^ 

{H^t^H' + Cn^W + Cn^Wtr^t-^nV)- 1 )} (l + tipt-iinS)- 1 )) , 
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where c n ^ q = n/(n + 2q) and ^t-i = ^t-i — Tt-i^t-2^t-i * s a deterministic 
function of the elements of Aj-i- In the limit as n —> oo, 

£?[Et | A t _i] = W + FE f _iF' + #* t _i#' 
= S + F(Z t - 1 -S)F f + 

ff((Et_i - 5) - (Tt-iEt^Tj.! - FSF , ))H'. 

Derivations of these conditional moments are provided in the Appendix. 

The new structure of joint distributions of the innovations {Y^,^} is to be 
explored, as are extensions of the MCMC for model fitting. 

8. Related Models. 

A Markov Latent Variable Construction. As discussed in Section 2.3, our IW- 
AR(1) model in q = 1 dimensions relates closely to the univariate model arising 
via a latent variable construction introduced by Pitt and Walker (2005); Pitt et al. 
(2002). We can extend the univariate model of that reference to the multivariate 
case, as follows. The Ei : t process is coupled with a latent q x q variance matrix 
process Ai : t via time t conditionals: (Z t \At) ~ 7H / g (n+2+a, nS+aBk t B') with 
(At|St_i) ~ W q (a, AE t -iA f /a) a Wishart conditional for some a > and non- 
singular q x q matrix A = B~ x . It can be shown that this latent variable construction 
defines a valid joint distribution with margin T, t ~ IW q (n + 2, nS) for all t. This 
leads to an AR(1) transition modelp(Et|St-i) in closed form and appears to be the 
most general AR(1) construction based on the latent variable/process idea of Pitt 
and Walker (2005). 

Although producing identical margins to the IW-AR(l), the proposed multivari- 
ate extension of the Pitt and Walker (2005) construction is limited. Such mod- 
els are always reversible. Most critically, the construction implies E(Et\T,t-i) = 
S + w(Yi t -i — S), where w = a/(n + a) is scalar. So, in contrast to the F matrix 
of the IW-AR(l), there is no notion of multiple autoregressive coefficients for flex- 
ible autocorrelation structures on the elements of T, t . Finally, it is not clear how to 
extend to higher-order autoregressive models. 

Direct Specification of Transition Distributions. The interesting class of models 
of Philipov and Glickman (2006a) specifies the transition distribution for T, t given 
St-i as inverse Wishart, discounting information from the previous matrix. Specif- 
ically, the conditional mean is given by cFY^_ x F' for some c, d > and matrix 
F. The Markov construction generates models with stationary structure. Scaling 
to higher dimensions, the authors apply the proposed stationary Wishart models to 
the variance matrix of a lower-dimensional latent factor in a latent factor volatility 
model (Philipov and Glickman, 2006b), extending prior approaches based on dy- 
namic latent factor models (Aguilar and West, 2000; Aguilar et al., 1999). Based on 
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the specification of Wishart Markov transition kernels, the proposed models do not 
yield a clear marginal structure and extensions to higher dimensions appear chal- 
lenging. Furthermore, the proposed sampling-based model fitting strategy yields 
low acceptance rates in moderate dimensions (e.g., q = 12). 

Related approaches in Gourieroux et al. (2009) define Wishart (and non-central 
Wishart) processes via functions of sample variance matrices of a collection of 
latent vector autoregressive processes. Specifically, when the degree of freedom 
n is integer, Y> t = J2k=i x ^t x 'hv w ^ eac ^ x k independently defined via x^t = 
Mxkj-i + e k,t an d e k,t ~ N(0, Ho). For stationary autoregressions, one can ana- 
lyze the marginal distribution of £j. Extensions to higher order processes are also 
presented. For model fitting, the authors rely on a (non-asymptotically efficient) 
method of moments assuming that a sequence of observed volatility/co-volatility 
matrices are available. Extensions to embedding the proposed Wishart autoregres- 
sive process within a standard stochastic volatility framework is computationally 
complicated: a mean model can be estimated based on nonlinear filtering approx- 
imations of latent volatilities. Within Bayesian analysis of such a setup, the non- 
central Wishart does not yield an analytic posterior distribution and is challenging 
to sample. One might be able to exploit latent process constructions, but the anal- 
ysis is not straightforward. 

AR models for Cholesky elements. Several recent works use linear, normal AR(1) 
models for off-diagonal elements of the Cholesky of T, t an d f° r the log-diagonal el- 
ements (Cogley and Sargent, 2005; Lopes et al., 2010b; Nakajima and West, 201 1; 
Primiceri, 2005), building on the Cholesky-based heteroscedastic model of Pourah- 
madi (1999), and a natural parallel of Bayesian factor models for multivariate 
volatility (Aguilar and West, 2000; Aguilar et al., 1999). However, each autore- 
gression has an interpretation as the time-varying regression parameters in a model 
in which the ordering of the elements of the observation vector is required and 
plays a key role in model formulation. For models in which this is not the case, the 
parameters employed in the autoregressions are less interpretable. We can cast our 
IW-AR within a similar framework. The inverse Wishart margins for T, t and E$-i 
translate to Wishart margins for the precision matrices S^ 1 and S^_\. Since each 
qxq Wishart matrix can be equivalently described via an outer product of a collec- 
tion of q x q identically distributed normal random variables, our IW-AR implicitly 
arises from a first-order Markov process on the normal random variables and thus 
defines a Gaussian autoregression, though possibly of a nonlinear form. Note that 
there are a few key differences between the IW-AR induced element-wise autore- 
gressions and the Cholesky component AR models: (i) the IW-AR autoregressions 
are on elements of the precision matrix and (ii) these elements comprise a matrix 
square root, but not the Cholesky square root. The issue of implicitly defining an 
ordering of observations when using a Cholesky decomposition is not present in 
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the matrix square root considered in the IW-AR case. 

9. Final Comments. The structure of the proposed IW-AR processes imme- 
diately open possibilities for examining alternative computational methods and ex- 
tensions to parsimonious modeling of higher-dimensional time series. 

The inherent state-space structure of the IW-AR also suggests opportunity to 
develop more effective computational methods using some variant of particle fil- 
tering and particle learning (Carvalho et al., 2010; Lopes et al., 2010a). Among 
the main challenges here is that of including the fixed parameters - or expanded 
state variables that include approximate sufficient statistics for these parameters - 
in particulate representations of filtering distributions (Liu and West, 2001). One 
possible approach is to harness ideas from particle MCMC (Andrieu et al., 2010). 
Otherwise, the new IW-AR model class is inherently well-suited to the most effec- 
tive reweight/resample strategies of particle learning for sequential Monte Carlo. 

The inverse Wishart distribution also has extensions to hyper-inverse Wishart 
(HIW) distributions for variance matrices constrained by specified graphical mod- 
els (Carvalho et al., 2007; Dawid and Lauritzen, 1993). Graphical models provide 
scalable structuring for higher-dimensional problems, and it would be interesting 
to consider extensions of the IW-AR to HIW-AR processes that evolve maintaining 
the sparsity structure (of the precision matrix) specified by a graphical model. 

APPENDIX A: PROOFS OF THEORETICAL PROPERTIES 

A.l. Proof of Theorem 3.3. Let v tk denote the fcth row of T^ and fk the fcth 
column of F. Then, for the IW-AR(l) we can write the (z, j) element of E^ as 

(A.l) ^ t ,ij = ^t,ij+vt i ^t-iv f tj - 

Taking the expectation conditioned on £t_i, 

V q q 

E [St.y | E*_i] = ^f- + E E ^t-iME[vt tk v tje ] 

n + q fc =i e=i 



nVi 



q q 



+ EE Ef - i .« 



n + q 

H k=i e=i 



n + q K 



+ FikFjf, 
H 



(A.2) = ^ + tr (x t _ 1 ^L(nS)- 1 )+f i X t - 1 f' j 

n+ q \ n+ q J J 

where we have used the fact that E[^ t ] = nV/(n + q). In matrix form, we have 

(A3) m | E,_J = (! + »(*-.(■*)-')) „„ + FE( _ lF - 



30 E. FOX AND M. WEST 

A.2. Proof of Theorem 3.4. For the case of F = ERE' and S = EQE', we 
can write eqn. (2.3) as 



(A.4) 

St_! S t _iT{ \ TW ( n , t>n ( E \ / Q QR\(E> 



Standard theory implies that 

(A.5) 

E' \ / Et_i E t _iTj \ / E \ TW / ^ . ( Q QR 

The derivation of the conditional mean is exactly as in the general IW-AR case, 
noting that tr^-^nQ)" 1 ) = £iEt-i,«/(n&). 

A.3. Proof of Theorem 3.5. Assume that H^Hoo < A, ||S' _1 || (X) < A, ||F||oo < 

A, ||So||oo < A, and for some t — 1 



rrf-lv crf-1' i n fo rt-lcpt-1'^ _/l-l 



(A.6) 

£7[E t _] I S ] - F'-^oF 1 - 1 ' + ^— ( 5 - F'-'SF 1 - 1 ' ) + O 

n + g V / \ n 

To prove that eqn. (A.6) holds for general t, we apply iterated expectations to the 
conditional expectation of eqn. (3.5): 

(A.7) 

Efo I So] = FE\E t -i I X ]F' + -^-V + -^-tr (£[E t _i | So]^ 1 ) 

V_ 

q \ J n + g 

where we have used the definition V = S — FSF f . Since 



(A.8) = F%F* + -^— (S - F'SF 1 ) + tr (#[E t _i I Sq]^ 1 ) , 



(A.9) -^£r (£[E t -i I Eq]^ 1 ) 

we conclude that, indeed, 

(A.10) ME t I S ] = F%F* + — ^— (s - F^SF**) + O ( — ] • 

n+ q V / \ n J 

Then also 

(A.ll) lim £7[E* | S ] = S + F*(E - S)F'K 
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A.4. Proof of Theorem 3.6. Let A = Q(I - R 2 ) such that V = EKE'. 
According to eqn. (3.5), E[Si | So] then shares the same eigenspace as So (i.e., 
the eigenvectors are given by E), and by induction, so does E[S t | So] for all t. Let 
© t | denote the diagonal matrix of eigenvalues of E[S t | So]. Since tr(S£ -1 ) = 
tr(6Q _1 ), eqn. (3.5) can be rewritten solely in terms of the eigenvalues: 



(A.12) 



©t|0 = 



A 



n 



n + q 



tr (e^noQ" 1 ) + ——A + RS t _ ll0 R f . 



n + q 



In terms of the vectors of eigenvalues 9 t \ = diag (©t|o)» £ = diag(Q) and £_i 
diag (Q _1 ) we have 



(A.13) 
Letting a 

(A. 14) 



n 



H\Q 



n+q 



n + q 
(I - R 2 )£ and B 



-{I-R 2 )iCi + R 2 



.,,--, ^t— 110- 

n + q J ' 

i-(7 - i? 2 )CC-i + # 2 > we conclude that 



n+g 



t-1 



t | O = 5 t e o + X) 5Ta - 



r=0 



Since £> represents a matrix (strictly) convex combination of ^r 
imum eigenvalue of £> is bounded by 



and /, the max- 



(A.15) 



(I - R 2 ) max < eig 



n + q 



1 + ir • 1 



Here, max{eig(A)} denotes the maximum eigenvalue of A. The term ££'_i is a 
rank 1 matrix implying that the only non-zero eigenvalue is equal to tr(^^_ 1 ) = q. 
Thus, regardless of n, B has eigenvalues with modulus strictly less than 1 since 

^p has q — 1 eigenvalues equal to and one equal to ^— < 1. This implies that 
the conditional mean of the process forgets the initial condition Sq exponentially 
fast regardless of n. Furthermore, since the eigenvalues of B have modulus less 
than 1, 



(A.16) 



9 tl0 = B% + (I-B)- 1 (I-B t )a, 
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implying that, as expected, the eigenvalues of the limiting conditional mean are 
exactly those of the marginal mean S: 



(l-R 2 )ti 



(A. 17) 


lim e t]Q 


= {I-B)~ 1 a 




(A. 18) 




= 


{{■-f;\) < 1 -^- 1 } 


n 


n + q 


(A. 19) 




n + q \ n + qj 




(A.20) 




= t 







The last equality follows from matrix inversion and the fact that £_-[£ = q. 

APPENDIX B: DERIVATION OF FORWARD FILTERING BACKWARD 

SAMPLING ALGORITHM 

B.l. Approximate Forward Filtering. The inverse Wishart prior on Ai can 
be analytically updated to an inverse Wishart posterior conditioned on y\ : 



(B 



,1) p(Ai | i/i) = IW 2g L + 3, n ( p S S g )+ ViVi 



To propagate to t = 2, we use the Chapman-Kolmogorov equation, integrating 
over Ai: 

p(A 2 | yi ) oc / p(A 2 | Ai)p(Ai | yi)dA 1 

°t / 5 Ei=^ 1 +T 1 E T / 1 P(*2)p(T 2 | *2)p(Ai | yi)dTid*idS 

oc IW g (* 2 I n + q + 2, nV)N(T 2 | F, * 2 , (nS)' 1 ) 
(B.2) IW g (Ei | n + 3, nS + x lX [). 

Here, we have used the fact that the transition kernel p(A 2 | Ai) simply involves 
independent innovations {T2, ^2} and deterministically computing Si. Integrat- 
ing the elements used to compute Si (a component of A2), the marginal posterior 
can be derived from the joint posterior of the augmented variance matrix at time t 
given in eqn. (B.l). Although an independent normal-inverse Wishart set of random 
variables can be combined with a g-dimensional inverse Wishart matrix to form a 
2g-dimensional inverse Wishart, as discussed in Section 2, there are restrictions 
on the parameterizations of these respective distributions. The set of distributions 
specified in eqn. (B.2) do not satisfy these constraints, and thus do not combine to 
form a 2g-dimensional inverse Wishart distribution on A2. Namely, the ^ 2 prior 
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and Hi posterior degrees of freedom do not match, nor do the Hi posterior scale 
matrix and the T2 prior variance term (nS) -1 . 

Since exact, analytic forward filtering is not possible, we instead approximate 
the propagate step with a moment-matched inverse Wishart distribution. That is, 

p(A 2 \ yi )^IW(n + 2,nE[A 2 \ Vl \) 
(B.3) ±g 2ll (A 2 \ yi ). 

Based on the approximations made in propagating, the subsequent update step is 
exact due to the conjugacy of the Gaussian observation and inverse Wishart predic- 
tive distribution. 

In general, we can choose an arbitrary degree of freedom parameter in our ap- 
proximate forward filtering. Assume that at time t we use r t degrees of freedom 
for the moment-matched approximation gt\t-i(^t | Vi-.t-i) to the predictive distri- 
bution p(At I yi:t-i). We use gt\t(^t | yi-.t) to denote the resulting approximation 
to the updated posterior p(A t | yi-.t). 

We initialize at t = 1 with r\ = n + 2 and 
(B.4) 

2i|o(Ai) = p(Ai) = IW(n, (n - 2)E[A 1 ]) i E[A ± ] = ( ^ S £ ) , 

5i|i(Ai I yi) = p(Ai I yi) = IW(n + 1, (n - 2)£7[Ai] + 2/12/1)- 

Propagating forward, 

<fen(A 2 I yi) = IW(r 2 , (r 2 - 2)£7 fll|1 [A 2 | 3/1]), 
52| 2 (A 2 I s/i: 2 ) = IW(r 2 + 1, (r 2 - 2)E gi{1 [A 2 | yi] + y 2 y 2 ). 

Here, the predictive mean is derived as 

(B.6) i^JAa I 1/1] = (^ F ^ ps^ + ^+^s^nS)- 1 )) J ' 

with 5i = ^.JSi I yi] = (ri -y . The term S^jEa | Vl ] is derived 
via iterated expectations, namely E gill [E\Y\ 2 | Si, 7/1]], and employing eqn. (3.5) 
noting that S2 is conditionally independent of y\ given Si. 
The forward filter then recursively defines 



(B.7) 



9t\t-i(&t I yv.t-i) = IW (r t , {r t - 2)E gt _ 1]t _ 1 [A t | y 1:t _!]) , 

g t \ t (A t I yi:t) = IW (r t + 1, (r* - 2)E 9t _ 1]t _ 1 [A t \ yi-.t-i] + y t y' t ) , 
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(B.8) Eg^.^ [A t | yi:t_i] 



St- 



St-iF' 



FSt-i FSt-iF' + -*£-(! + tr^^nS)" 1 )) 



n+g v 



and 



(B.9) 



(r t - 2) (Fft_iF' + ^(1 + triSt-iinS)- 1 ))) + ^ 



n 



B.2. Backward Sampling. The density required for backward sampling is the 
posterior of {St-i, Tj, *t} conditioned on E* and yi^, which can be written as 

(B.10) p(£ t -i, T tj * t I £*, 2/i :T ) = p(Et-i, T t , tt t | E t , y 1:t ) 

(B.H) OC p(E t _l, T t , *t | J/l:t)5Et=^t+TtE t _iT{ 

( B - 12 ) O^ 5t|t(At | 2/l:t)*S t =^ t +T t E t -iT{- 

Thus, sampling {E$_i, T t , \Pt} f rom this conditional posterior is equivalent to fix- 
ing Y> t in the A^ matrix and sampling a valid {X^-i, T t , \E^} conditioned on E^ 
based on the forward filtering distribution g t \t(^t \ Hi:t)> By valid, we mean a 
value that corresponds to Y> t = ^ t + T^E^-i Y£. 
Based on eqn. (5.5), 



(B.13) g t{t 
implying that 
(B.14) g t \ t 



TtEt-i St 



St TtEt-i 
St-iTi Et_i 



Vv.t 



iw r t + 1, 



G, 11 G t 21 ' 

/o21 r^22 



yi:t 



iw n + 1, 






21 



C2L /Ol 



11 



^22 



Here, Gt is the forward filtering term defined in eqn. (5.11), with G\ , G\ , G 
denoting the three unique gxg sub-blocks (G] 2 = G 21 ). The form of eqn. (B.14) 
allows us to use the previously discussed properties of the inverse Wishart distribu- 
tion to sample {Ej-i , T t , ^t} conditioned on E t and yi :t . Specifically, as discussed 
in Section 2, there exists a {Tt, S&t} such that TtEj = E^-iTj and 



(B.15) 



*t I W:t - IW(r t + 1 + ?, G, 11 - Gf\Gf )- 1 G? 1 ), 
T* | * t ,y 1:t ~iV(Gf (Gf^-S^^Gf)" 1 ) 
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with 

(B.16) Et_i = * t + f t Etf{. 

Thus, to sample T>t-i conditioned on T, t from the approximation to p(£t_i | 
S t , yi:t), we first sample {T t , \E^} as specified in eqn. (B.15) and then compute 
St-i based on eqn. (B.16). 

APPENDIX C: MULTIVARIATE T DISTRIBUTION 

The g-dimensional multivariate t distribution with v degrees of freedom and 
parameters ji and S has density 



(C.l) t u (x | /i, E/i/) = a^lSr 1 / 2 M + i(x - /i/lT 1 ^ - aO ) 



-(i/+g)/2 



where 

r((i/ + g )/2) 

For the proposed IW-AR, since T^ | \&t ~ iV(F, ^ 3 (^SO -1 ) standard theory gives 
Tt^t | ^t, ^ ~ N(Fz tl ^t(z , t (nS)~ 1 z t )). Marginalizing over T^ in eqn. (4.1), we 
have 

(C.2) p(x t | ^, * t ) = N(Fzt, %(1 + 4(n5)-U)). 

Now, ^ t ~ IW(n + g + 2,nV) implies * t (l + z^nS)' 1 z t ) \ z t ~ IW(n + 
q + 2, nF(l + ^(n5) _1 zt)) Marginalizing \I>£ from the distribution of eqn. (C.2) 
yields the t distribution with density 

(C.3) p(x t | z t ) = t n+q+2 [Fzt, (1 + z'tinS^zt) ^ ) . 

The marginal likelihood of F and S given 2i : t follows immediately. 

APPENDIX D: CONDITIONAL MEAN OF IW-AR(2) MODEL 
For the IW-AR(2) process in Section 7.2 we have 

(D.l) E[* t | * t -i] = #**-!#' + -^r (1 + tr^t-iinV)- 1 )) . 

n + 2q 
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Since E[Xt I A t _i] = E[V t I A t _i] + E[T t E t -iT' t | A*_i], in deriving the 
conditional mean of E t given A t _i we first need 
(D.2) 
^[[TtEt-iT^ | A t _i] = E[E[[?tXt-irt]ij I **] I At-i] 

= E§2x t -i M E[r ttik r t>je | * t ] | A t _i] 

k,l 

= E[J2 Zt-i,ki {[^ijinS)- 1 ]^ + F ik F je ) | A t _i] 

k,l 

= E^t^inS)- 1 )^ + i^E^i* | A^] 
= trCZt-iinSr^EiVw | A t _i] + F^^Fj., 

implying 

(D.3) E[r t ^t-iT' t | A t _i] = trCSt-iCnS)- 1 )^^ | A t _i] + F^t-iF'. 

Noting that E[% I A t _i] = E[E[% | * t _i] | A t _i] and £[* t _i | A t _i] = 
Vl>t-i since *t-i is a deterministic function of the elements of A t _i, eqn. (7.19) 
follows directly. That is, 

E[E t | A t _i] = FXt-iF' + E[% I *t-i](l + trCEt-i^)" 1 )) 

with £[* t | * t _i] as in eqn. (D.l). 
In the limit as n — > oo, we have 

E[X t | A t _i] = W + FE^F' + H*t-iH' 

= V- HVH' + FXt-tF' + if (Et_i - iiTi-iS^T^if' 
= 5 - FSF' + FE t _iF' 

+ ff ((St_i - Tt-iEt-aTj.!) - (5 - FSF'))^ 
= 5 + F(Et_i - 5)F' 

+ ff ((E t _i - 5) - (T i _ 1 E i _ 2 T;_ 1 - FSF'))H'. 

APPENDIX E: SAMPLING FOR STOCHASTIC VOLATILITY IN TIME 
SERIES MODELS WITH IW-AR(l) COMPONENTS 

In eqn. (6.1), the conditional posterior of the autoregressive parameters A (i.e., 
Step of the sampler) is given as follows. 

Step 0. Sample the observation autoregressive parameter A given Ai : r and £i- r: r- 
Assume Ai diagonal defined by the g-vector a* = diag(-Aj). The autoregressive 
process of eqn. (6.1) can be equivalently represented as 

(E.l) &=[diag(&_i) ••• diag(6-r)] [a[ ■■■ a' r ]' + x t . 
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Under a prior [a[ • • • a! r ~\ ~ N{fi ai S a ), standard theory yields the conditional 
for [a 7 x • • • a f r ] | So:T,Ci-r:T as multivariate normal with easily computed 
moments. For t = 1, . . . , T, set x t = & - DI=i ^6-i- 
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